Abstract. Gaussian process models, also called Kriging models, are often used as mathematical approximations of expensive experiments. However, when classical covariance kernels are used, the number of observations required for building an emulator becomes unrealistic as the input dimension increases. In order to get around the curse of dimensionality, a popular approach is to consider simplified models such as additive models. The ambition of the present work is to give an insight into covariance kernels that are well suited for building additive Kriging models and to describe some properties of the resulting models.

Résumé. Gaussian process modeling, also called Kriging, is often used to obtain a mathematical approximation of a function whose evaluation is costly. However, the number of evaluations needed to build a model based on usual covariance kernels becomes excessive when the dimension of the input variables increases. To circumvent the curse of dimensionality, a well-known alternative is to consider simplified models such as additive models. We present here a class of covariance kernels suited to the construction of additive Kriging models, and we describe some properties of the resulting models.



1. Introduction 

The study of numerical simulators often deals with computationally intensive computer codes. This cost implies that the number of evaluations of the numerical simulator is limited, and thus many methods such as uncertainty propagation, sensitivity analysis, or global optimization are unaffordable. A well-known approach to circumvent time limitations is to replace the numerical simulator by a mathematical approximation called a metamodel (but also emulator, response surface or surrogate model), based on the responses of the simulator for a limited number of inputs called the Design of Experiments (DoE). There are many types of metamodels; among the most popular we can cite regression, splines and neural networks. In this article, we focus on a particular type of metamodel: the Kriging method, more recently referred to as Gaussian Process modeling [13]. Originally presented in spatial statistics [3] as an optimal linear unbiased predictor of random processes, Kriging has become very popular in machine learning, where its interpretation is usually restricted to the convenient framework of Gaussian Processes (GP). The latter point of view allows the explicit derivation of conditional probability distributions for the response values at any point or set of points in the input space.

Since Kriging is usually based on local basis functions, it requires an increasing number of points in the DoE to cover the domain D when the dimension d of the input space $D \subset \mathbb{R}^d$ becomes high [16, 4]. An approach to get around this issue is to consider specific features lowering complexity, such as the family of Additive Models (AM). In this case, the emulator m can be decomposed as a sum of univariate functions:

$$ m(x) = \mu + \sum_{i=1}^{d} m_i(x_i), \qquad (1) $$

where $\mu \in \mathbb{R}$ and the $m_i$'s may be nonlinear. Since their introduction by Stone in 1985 [17], many methods have been proposed for the estimation of additive models. We can cite the method of marginal integration [12] and a very popular method described by Hastie and Tibshirani in [7, 8]: the GAM backfitting algorithm. However, those methods do not consider the probabilistic framework of GP modeling and usually do not provide additional information such as the prediction variance. Combining the high-dimensional advantages of AMs with the versatility of GPs is the main goal of the present work. For the study of functions that contain an additive part plus a limited number of interactions, details can be found in a recent article by T. Muehlenstaedt [11].

The first part of this paper focuses on the unsuitability of usual separable kernels (e.g. power-exponential and Matérn) for high-dimensional modeling. The second part deals with additive Gaussian processes, their associated kernels and the properties of the associated Additive Kriging Models (AKM). Finally, AKM is compared with standard Kriging models on a well-known test function: the g-function of Sobol [15]. This example shows that AKM outperforms standard Kriging and produces performance similar to GAM. Due to its approximation performance and its built-in probabilistic framework, the proposed AKM appears as a serious and promising challenger for high-dimensional modeling.
2. Towards additive Kriging

2.1. Additive random processes. Let us first introduce the mathematical construction of an additive GP. A function $f: D \subset \mathbb{R}^d \to \mathbb{R}$ is additive when it can be written $f(x) = \sum_{i=1}^{d} f_i(x_i)$, where $x_i$ is the i-th component of the d-dimensional input vector x and the $f_i$'s are arbitrary univariate functions. Consider two independent real-valued Gaussian processes $Z_1$ and $Z_2$ defined over the same probability space $(\Omega, \mathcal{F}, P)$ and indexed by $\mathbb{R}$, so that their trajectories $Z_i(\cdot\,;\omega): t \in \mathbb{R} \mapsto Z_i(t;\omega)$ are univariate real-valued functions. Let $K_i: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be their respective covariance kernels and $\mu_1, \mu_2 \in \mathbb{R}$ their means. Then, the process Z defined over $(\Omega, \mathcal{F}, P)$ and indexed by $\mathbb{R}^2$, characterized by

$$ \forall \omega \in \Omega,\ \forall x \in \mathbb{R}^2, \quad Z(x;\omega) = Z_1(x_1;\omega) + Z_2(x_2;\omega), \qquad (2) $$

clearly has additive paths, mean $\mu = \mu_1 + \mu_2$ and kernel $K(x,y) = K_1(x_1,y_1) + K_2(x_2,y_2)$. In this document, we call additive any kernel of the form $K: (x,y) \in \mathbb{R}^d \times \mathbb{R}^d \mapsto K(x,y) = \sum_{i=1}^{d} K_i(x_i,y_i)$, where the $K_i$'s are symmetric positive semidefinite (s.p.) kernels over $\mathbb{R} \times \mathbb{R}$. Although such kernels are not commonly encountered in practice, it is well known that a sum of s.p. kernels is also an s.p. kernel [13, 6]. Moreover, one can show that the paths of any random process with an additive kernel are additive in a certain sense:

Proposition 1. Any (square integrable) random process $Z_x$ possessing an additive kernel is additive up to a modification. In essence, it means that there exists a process $A_x$ whose paths are all additive, and such that $\forall x \in D$, $P(Z_x = A_x) = 1$.

The proof of this property is given in Appendix A for d = 2. For general d, the proof follows the same pattern but the notation is more cumbersome. Note that the class of additive processes is not actually limited to processes with additive kernels. For example, let us consider two correlated Gaussian processes $Z_1$ and $Z_2$ on $(\Omega, \mathcal{F}, P)$ such that the couple $(Z_1, Z_2)$ is Gaussian. Then $Z_1(x_1) + Z_2(x_2)$ is also a Gaussian process with additive paths, but its kernel is not additive. However, in this article the term additive process will always refer to GPs with additive kernels.

2.2. Invertibility of covariance matrices. As mentioned in [2], the covariance matrix K of the observations of an additive process Z at a design of experiments $X = (x^{(1)}, \dots, x^{(n)})^T$ may not be invertible even if there is no redundant point in X. Indeed, the additivity of Z may introduce linear relationships (that hold almost surely) between the observed values of Z and lead to the non-invertibility of K. Figure 1 shows two examples of designs leading to a linear relationship between the observations. For the left panel, the additivity of Z implies that $Z(x^{(4)}) = Z(x^{(2)}) + Z(x^{(3)}) - Z(x^{(1)})$ a.s., so there is a linear relationship between the columns of K: $K(x^{(i)}, x^{(2)}) + K(x^{(i)}, x^{(3)}) - K(x^{(i)}, x^{(1)}) - K(x^{(i)}, x^{(4)}) = 0$ for all i, and therefore the matrix is not invertible.
Figure 1. Two-dimensional examples of DoEs which lead to non-invertible covariance matrices when using additive kernels. In both cases, one point can be removed from the DoE without any loss of information.



An approach which is in accordance with the aim of parsimonious evaluations of the simulator is to remove some points of the DoE in order to eliminate these linear relationships. Algebraic methods may be used for determining the subset of points leading to the linear relationship. Indeed, the coefficients of the linear combination are given by the eigenvectors associated with the null eigenvalues of K, so the subset of points leading to the non-invertibility of the covariance matrix can be obtained easily. However, the study of a procedure for setting aside unnecessary training points is out of the scope of this paper.
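As a minimal sketch of this eigenvector diagnosis, the NumPy code below builds the covariance matrix of four points at the corners of a rectangle under an additive kernel. The univariate squared-exponential kernel, its range θ = 0.6 and the design coordinates are hypothetical choices for illustration only. The null eigenvector recovers the coefficients of the relation between the columns of K, and dropping one point restores invertibility.

```python
import numpy as np

def k_se(s, t, sigma2=1.0, theta=0.6):
    # Univariate squared-exponential kernel (assumed parametrization).
    return sigma2 * np.exp(-((s - t) ** 2) / (2.0 * theta ** 2))

def additive_kernel(A, B):
    # Additive kernel K(x, y) = sum_i k_se(x_i, y_i) between the rows of A and B.
    return sum(k_se(A[:, [i]], B[:, i][None, :]) for i in range(A.shape[1]))

# Hypothetical rectangle design: x(4) shares x_1 with x(2) and x_2 with x(3).
X = np.array([[0.2, 0.3],   # x(1)
              [0.7, 0.3],   # x(2)
              [0.2, 0.8],   # x(3)
              [0.7, 0.8]])  # x(4)
K = additive_kernel(X, X)

eigvals, eigvecs = np.linalg.eigh(K)                  # K is symmetric
null_idx = np.argmin(np.abs(eigvals))                 # (numerically) null eigenvalue
coeffs = eigvecs[:, null_idx] / eigvecs[3, null_idx]  # scaled so the x(4) coefficient is 1

print(eigvals.min() / eigvals.max())   # ratio at rounding level: K is singular
print(np.round(coeffs, 6))             # coefficients (1, -1, -1, 1) of the relation
# Dropping x(4) leaves a positive definite 3 x 3 covariance matrix.
print(np.linalg.eigvalsh(K[:3, :3]).min())
```

The null space is one-dimensional here because a vector annihilated by K must be annihilated by both $K_1$ and $K_2$, which forces the coefficients (1, -1, -1, 1) up to scaling.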

2.3. Additive Kriging. Let $f: D \to \mathbb{R}$ be the function of interest (a numerical simulator for example), where $D \subset \mathbb{R}^d$. The responses of f at the DoE X are denoted $F = (f(x^{(1)}), \dots, f(x^{(n)}))^T$. Simple Kriging relies on the hypothesis that f is one path of a centered random process Z with kernel K. The expressions of the best predictor (also called the Kriging mean) and of the prediction variance are:



$$ \begin{aligned} m(x) &= E[Z(x) \mid Z(X) = F] = k(x)^T K^{-1} F, \\ v(x) &= \mathrm{var}[Z(x) \mid Z(X) = F] = K(x,x) - k(x)^T K^{-1} k(x), \end{aligned} \qquad (3) $$

where $k(\cdot) = (K(\cdot, x^{(1)}), \dots, K(\cdot, x^{(n)}))^T$ and K is the covariance matrix of general term $K_{i,j} = K(x^{(i)}, x^{(j)})$. Note that these equations respectively correspond to the conditional expectation and variance in the case of a GP with known kernel. In practice, the structure of K is supposed to be known (e.g. power-exponential or Matérn families) but its parameters are unknown. A common way to estimate them is to maximize the likelihood of Z(X) = F [3, 13].

In some cases, the evaluation of f includes an observation noise ε. To take it into account, m and v are defined as the conditional expectation and variance of Z knowing Z(X) + ε(X) = F. If we assume that ε is a Gaussian white noise process with variance τ², we obtain:

$$ \begin{aligned} m(x) &= E[Z(x) \mid Z(X) + \varepsilon(X) = F] = k(x)^T (K + \tau^2 \mathrm{Id})^{-1} F, \\ v(x) &= \mathrm{var}[Z(x) \mid Z(X) + \varepsilon(X) = F] = K(x,x) - k(x)^T (K + \tau^2 \mathrm{Id})^{-1} k(x). \end{aligned} \qquad (4) $$

As we can see, the covariance matrix of ε(X) appears in the expressions of m and v. As we will use later, this remark is still valid when ε(X) is a centered Gaussian vector independent of Z, with $\tau^2 \mathrm{Id}$ replaced by its covariance matrix.
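The following NumPy sketch implements the Kriging mean and variance of equations (3) and (4) with an additive kernel; setting the noise variance `tau2` to zero recovers equation (3). The kernel parametrization, the five design points and the responses are hypothetical, and the function name `simple_kriging` is ours, not taken from any library.

```python
import numpy as np

def k_se(s, t, theta=0.6):
    # Univariate squared-exponential kernel with unit variance (assumed).
    return np.exp(-((s - t) ** 2) / (2.0 * theta ** 2))

def additive_kernel(A, B):
    # K(x, y) = sum_i k_se(x_i, y_i) between the rows of A and B.
    return sum(k_se(A[:, [i]], B[:, i][None, :]) for i in range(A.shape[1]))

def simple_kriging(kernel, X, F, Xnew, tau2=0.0):
    """Kriging mean and variance: eq. (3) when tau2 == 0, eq. (4) when tau2 > 0."""
    K = kernel(X, X) + tau2 * np.eye(len(X))  # K + tau^2 Id
    k = kernel(X, Xnew)                        # column j is k(x) for x = Xnew[j]
    alpha = np.linalg.solve(K, k)              # (K + tau^2 Id)^{-1} k(x)
    m = alpha.T @ F                            # k(x)^T (K + tau^2 Id)^{-1} F
    v = np.diag(kernel(Xnew, Xnew)) - np.sum(k * alpha, axis=0)
    return m, v

# Hypothetical 2D design (distinct coordinates, so K is invertible) and arbitrary responses.
X = np.array([[0.1, 0.2], [0.4, 0.9], [0.55, 0.35], [0.8, 0.7], [0.95, 0.05]])
F = np.array([0.3, -1.0, 0.5, 1.2, -0.4])

m, v = simple_kriging(additive_kernel, X, F, X)                        # interpolates F, v ~ 0
m_noisy, v_noisy = simple_kriging(additive_kernel, X, F, X, tau2=0.1)  # smooths F, v > 0
```

Solving the linear system instead of forming $K^{-1}$ explicitly is the usual numerically safer choice.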

Equations (3) and (4) are valid for any s.p. kernel, so they can be applied with additive kernels. In this case, the additivity of the kernel implies the additivity of the Kriging mean, so m can be split into a sum of univariate submodels $m_1, \dots, m_d$. For example, in dimension 2 with additive kernel $K(x,y) = K_1(x_1,y_1) + K_2(x_2,y_2)$, we have

$$ \begin{aligned} m(x) &= (k_1(x_1) + k_2(x_2))^T (K_1 + K_2)^{-1} F \\ &= k_1(x_1)^T (K_1 + K_2)^{-1} F + k_2(x_2)^T (K_1 + K_2)^{-1} F \\ &= m_1(x_1) + m_2(x_2), \end{aligned} \qquad (5) $$

where $k_i(x_i) = (K_i(x_i, x_i^{(1)}), \dots, K_i(x_i, x_i^{(n)}))^T$ and $K_i$ is the matrix of general term $K_i(x_i^{(j)}, x_i^{(l)})$.
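A short numerical check of equation (5), assuming a unit-variance squared-exponential kernel in each direction and a hypothetical design: the full Kriging mean coincides with the sum of the two univariate submodels, which share the same weight vector $(K_1 + K_2)^{-1} F$.

```python
import numpy as np

def k_se(s, t, theta=0.6):
    # Univariate squared-exponential kernel with unit variance (assumed).
    return np.exp(-((s - t) ** 2) / (2.0 * theta ** 2))

# Hypothetical 2D design and arbitrary responses.
X = np.array([[0.1, 0.2], [0.4, 0.9], [0.55, 0.35], [0.8, 0.7], [0.95, 0.05]])
F = np.array([0.3, -1.0, 0.5, 1.2, -0.4])

K1 = k_se(X[:, [0]], X[:, 0][None, :])   # matrix of K_1 on the first coordinates
K2 = k_se(X[:, [1]], X[:, 1][None, :])   # matrix of K_2 on the second coordinates
w = np.linalg.solve(K1 + K2, F)           # (K_1 + K_2)^{-1} F, shared by both submodels

def m1(x1):
    # k_1(x_1)^T (K_1 + K_2)^{-1} F
    return k_se(np.asarray(x1)[:, None], X[:, 0][None, :]) @ w

def m2(x2):
    # k_2(x_2)^T (K_1 + K_2)^{-1} F
    return k_se(np.asarray(x2)[:, None], X[:, 1][None, :]) @ w

def m(x):
    # Full Kriging mean (k_1(x_1) + k_2(x_2))^T (K_1 + K_2)^{-1} F
    return (k_se(x[:, [0]], X[:, 0][None, :]) + k_se(x[:, [1]], X[:, 1][None, :])) @ w

rng = np.random.default_rng(0)
Xnew = rng.uniform(size=(1000, 2))
print(np.max(np.abs(m(Xnew) - (m1(Xnew[:, 0]) + m2(Xnew[:, 1])))))  # rounding-level gap
```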

Another interesting property concerns the variance: v can vanish at points that do not belong to the DoE. Let us consider a two-dimensional example where the DoE is composed of the 3 points $X = \{x^{(1)}, x^{(2)}, x^{(3)}\}$ represented on the left panel of Figure 1. Direct calculation (see Appendix B) shows that the prediction variance at the point $x^{(4)}$ is equal to 0. This particularity follows from the fact that, given the observations at X, the value of the additive process at the point $x^{(4)}$ is known almost surely. In the next section, we illustrate the potential of AKM on a toy example.
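The zero-variance property can be checked numerically with a minimal sketch: three hypothetical corners of a rectangle form the DoE, the kernel is an additive unit-variance squared-exponential kernel (an assumption), and the prediction variance of equation (3) is evaluated at the missing fourth corner and at a generic point.

```python
import numpy as np

def k_se(s, t, theta=0.6):
    # Univariate squared-exponential kernel with unit variance (assumed).
    return np.exp(-((s - t) ** 2) / (2.0 * theta ** 2))

def additive_kernel(A, B):
    # K(x, y) = K_1(x_1, y_1) + K_2(x_2, y_2) between the rows of A and B.
    return sum(k_se(A[:, [i]], B[:, i][None, :]) for i in range(A.shape[1]))

# Three corners of a rectangle form the DoE (hypothetical coordinates).
X = np.array([[0.2, 0.3],    # x(1)
              [0.7, 0.3],    # x(2)
              [0.2, 0.8]])   # x(3)
K = additive_kernel(X, X)

def prediction_variance(Xnew):
    # v(x) = K(x, x) - k(x)^T K^{-1} k(x), eq. (3)
    k = additive_kernel(X, Xnew)
    return np.diag(additive_kernel(Xnew, Xnew)) - np.sum(k * np.linalg.solve(K, k), axis=0)

x4 = np.array([[0.7, 0.8]])   # fourth corner, not in the DoE
x5 = np.array([[0.5, 0.5]])   # a generic point, for comparison
print(prediction_variance(x4))   # zero up to rounding
print(prediction_variance(x5))   # strictly positive
```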

2.4. Illustration and further considerations on a 2D example. We present here a first basic example of an additive Kriging model. We consider $D = [0,1]^2$ and a set of 5 points in D where the values of the observations F are arbitrarily chosen. Figure 2 shows the obtained Kriging model. We can see on this figure the properties mentioned above: the Kriging mean is an additive function, and the prediction variance can be null at points that do not belong to the DoE.

As we have seen in eq. (5), the expression of the first univariate model is

$$ m_1(x_1) = k_1(x_1)^T (K_1 + K_2)^{-1} F. \qquad (6) $$

It appears that the effect of direction 2 can be seen as an observation noise with covariance matrix $K_2$. We thus get the following expression for the prediction variance:

$$ v_1(x_1) = K_1(x_1,x_1) - k_1(x_1)^T (K_1 + K_2)^{-1} k_1(x_1). \qquad (7) $$

Figure 2. Approximation of the function f based on five observations (black dots). The left panel represents the best predictor and the right panel the prediction variance. The kernel here is the additive squared-exponential kernel with parameters σ = (1, 1) and θ = (0.6, 0.6).

Figure 3. Univariate models of the 2-dimensional example (horizontal axis: x1). The left panel plots $m_1$ and the 95% confidence intervals $c_1(x_1) = m_1(x_1) \pm 2\sqrt{v_1(x_1)}$. The right panel shows the submodel of the centered univariate effect $\tilde m_1$ and $\tilde c_1(x_1) = \tilde m_1(x_1) \pm 2\sqrt{\tilde v_1(x_1)}$.

The left panel of Figure 3 shows the submodel $m_1$ and the associated 95% confidence intervals. However, the confidence intervals appear to be wide. This is because the submodels are only defined up to a constant. If we assume that $\int Z_i(s_i)\,ds_i$ exists a.s. [5], we can get rid of the effect of such a translation by emulating $Z_i(x_i) - \int Z_i(s_i)\,ds_i$ conditionally on the observations:



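$$ \begin{aligned} \tilde m_i(x_i) &= E\left[ Z_i(x_i) - \int Z_i(s_i)\,ds_i \,\middle|\, Z(X) = F \right], \\ \tilde v_i(x_i) &= \mathrm{var}\left[ Z_i(x_i) - \int Z_i(s_i)\,ds_i \,\middle|\, Z(X) = F \right]. \end{aligned} \qquad (8) $$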



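The expression of $\tilde m_i(x_i)$ is straightforward, whereas $\tilde v_i(x_i)$ requires more calculations, given in Appendix C:

$$ \begin{aligned} \tilde m_i(x_i) &= m_i(x_i) - \int m_i(s_i)\,ds_i, \\ \tilde v_i(x_i) &= v_i(x_i) - 2\int K_i(x_i,s_i)\,ds_i + 2\int k_i(x_i)^T K^{-1} k_i(s_i)\,ds_i \\ &\quad + \iint K_i(s_i,t_i)\,ds_i\,dt_i - \iint k_i(s_i)^T K^{-1} k_i(t_i)\,ds_i\,dt_i, \end{aligned} \qquad (9) $$

where K is the covariance matrix of the observations ($K = K_1 + K_2$ in the example above).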



The benefit of using $\tilde m_i$ and $\tilde v_i$, that is, of removing the constant up to which the submodels are defined, can be seen on the right panel of Figure 3. Furthermore, as the submodels $\tilde m_i$ are univariate and centered, they may give a good approximation of the main effects of f with relevant confidence intervals. In the end, the probabilistic framework gives an insight into the error of the metamodel but also of each submodel.
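The sketch below evaluates equations (6), (7) and (9) on a grid of [0, 1], assuming a unit-variance squared-exponential kernel in each direction, a hypothetical 5-point design, and a midpoint-rule quadrature for the integrals (the discretization is our choice, not the paper's). Under the same quadrature, $\tilde v_1$ must equal $a^T \Sigma a$, where Σ is the conditional covariance of $Z_1$ on the grid and $a = e_j - w$, which provides a check of the signs in eq. (9).

```python
import numpy as np

def k_se(s, t, theta=0.6):
    # Univariate squared-exponential kernel with unit variance (assumed).
    return np.exp(-((s - t) ** 2) / (2.0 * theta ** 2))

# Hypothetical 2D design and arbitrary responses.
X = np.array([[0.1, 0.2], [0.4, 0.9], [0.55, 0.35], [0.8, 0.7], [0.95, 0.05]])
F = np.array([0.3, -1.0, 0.5, 1.2, -0.4])
K = k_se(X[:, [0]], X[:, 0][None, :]) + k_se(X[:, [1]], X[:, 1][None, :])  # K_1 + K_2

# Midpoint-rule quadrature on [0, 1] for the integrals of eq. (9) (assumed discretization).
n_q = 400
s = (np.arange(n_q) + 0.5) / n_q
w = np.full(n_q, 1.0 / n_q)

k1_s = k_se(X[:, [0]], s[None, :])       # column j is k_1(s_j), shape (n, n_q)
Kinv_k1 = np.linalg.solve(K, k1_s)       # K^{-1} k_1(s_j)

# Submodel and prediction variance of eqs. (6)-(7), evaluated on the grid.
m1 = Kinv_k1.T @ F
v1 = 1.0 - np.sum(k1_s * Kinv_k1, axis=0)

# Centered submodel and its variance, eq. (9).
K1_ss = k_se(s[:, None], s[None, :])     # K_1(s_j, s_l)
C_ss = k1_s.T @ Kinv_k1                  # k_1(s_j)^T K^{-1} k_1(s_l)
int_K1 = K1_ss @ w                       # int K_1(x_1, s) ds, for x_1 on the grid
int_C = C_ss @ w                         # int k_1(x_1)^T K^{-1} k_1(s) ds
m1_tilde = m1 - w @ m1
v1_tilde = v1 - 2.0 * int_K1 + 2.0 * int_C + w @ int_K1 - w @ int_C

print(v1.max(), v1_tilde.max())
```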



3. Kriging, high-dimensional input space and linear budget 

We will see in this section that additive Kriging models can outperform usual Kriging models when the dimension of the input space becomes large. The notion of a high-dimensional input space can be interpreted differently depending on the context. In our case, we will consider that an input space is high-dimensional when its dimension is larger than 10, and we will consider examples up to dimension 50. This excludes simulators for which one of the inputs is a picture or a map (for example groundwater flow simulators depending on permeability and porosity maps), where it is not unusual to deal with 50000-dimensional input spaces.

Most of the time, the kernels used in computer experiments are power-exponential or Matérn kernels [13]. For those kernels, and for all other stationary kernels such that $K(x,y) \to 0$ when $\|x - y\| \to +\infty$, an observation at a point $x^{(i)}$ of the DoE has only a local influence on the emulator. This implies that the number of points required for accurately modeling a function increases exponentially with the dimension d of the input space. However, large training sets are rather inconsistent with the context of emulating a costly-to-evaluate function; in contrast, a total budget of 10 × d evaluations is sometimes advocated [10]. We now illustrate with an example that usual separable kernels are not appropriate for emulating high-dimensional functions with this budget, whereas additive kernels can advantageously be used to extract an additive trend.
Let Z be a centered Gaussian process over $[0,1]^d$ with unit variance and an isotropic squared-exponential kernel

$$ K(x,y) = \prod_{i=1}^{d} \exp\left( -\frac{(x_i - y_i)^2}{2\theta^2} \right). \qquad (10) $$

Let X be a Latin hypercube (LH) design of size 10 × d. Our aim is to investigate the reduction of variance obtained by conditioning Z on the observations at X when d increases. In order to quantify the proportion of variance explained by the emulator, we consider a test set $X_t = (x_t^{(1)}, \dots, x_t^{(n_t)})$ drawn from the uniform distribution and we compute the following criterion:

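$$ P = \frac{\sum_{i=1}^{n_t} \mathrm{var}\left( E\left( Z(x_t^{(i)}) \mid Z(X) \right) \right)}{\sum_{i=1}^{n_t} \mathrm{var}\left( Z(x_t^{(i)}) \right)}. \qquad (11) $$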



According to the law of total variance, we have for all i

$$ \mathrm{var}\left( Z(x_t^{(i)}) \right) = \mathrm{var}\left( E\left( Z(x_t^{(i)}) \mid Z(X) \right) \right) + E\left( \mathrm{var}\left( Z(x_t^{(i)}) \mid Z(X) \right) \right), \qquad (12) $$

so the values of P are in [0, 1]. As for a Q2 criterion (see eq. (18)), a value P = 1 implies that $Z(x_t^{(i)})$ is known a.s. for all test points, whereas P = 0 indicates that $E(Z(\cdot) \mid Z(X))$ is no more predictive than $E(Z(\cdot))$. As P does not take into account the distance between m and the function to fit, and as it favors overconfident models, this criterion is not meant to assess the quality of a GP emulator. However, when the kernel is known, as here, it is well suited for studying the prediction ability of a GP emulator.

As shown on Figure 4, the proportion of explained variance collapses when the dimension increases, and the smaller the range parameter, the sharper the fall. When the value of the range parameter θ is lower than half of the range of the data, simple or ordinary Kriging models with usual separable covariance are inappropriate to emulate high-dimensional functions with a budget of 10 × d observations. However, further tests showed that such a budget allows one to build very predictive GP emulators up to d = 100 when $\theta = \sqrt{d}$.
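A minimal sketch of the criterion P of eq. (11) for the kernel of eq. (10). Since Z has unit variance, the denominator equals $n_t$, and for a GP $\mathrm{var}(E(Z(x) \mid Z(X))) = k(x)^T K^{-1} k(x)$. The basic LH generator, the 1000 test points, the range θ = 0.2 and the small diagonal jitter are our assumptions, not the settings behind Figure 4.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    # Basic LH design: one point per stratum in each coordinate, randomly paired.
    strata = np.array([rng.permutation(n) for _ in range(d)]).T    # shape (n, d)
    return (strata + rng.uniform(size=(n, d))) / n

def se_kernel(A, B, theta):
    # Isotropic separable squared-exponential kernel of eq. (10), unit variance.
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * theta ** 2))

def explained_variance(d, theta, n_test=1000, seed=0, jitter=1e-8):
    # Criterion P of eq. (11) for an LH design of size 10 x d.
    rng = np.random.default_rng(seed)
    X = latin_hypercube(10 * d, d, rng)
    Xt = rng.uniform(size=(n_test, d))
    K = se_kernel(X, X, theta) + jitter * np.eye(len(X))  # jitter: numerical safeguard (assumption)
    k = se_kernel(X, Xt, theta)                            # shape (n, n_test)
    # var(E(Z(x)|Z(X))) = k(x)^T K^{-1} k(x), and var(Z(x)) = 1 for this kernel.
    explained = np.sum(k * np.linalg.solve(K, k), axis=0)
    return explained.mean()

for d in (2, 5, 10, 20):
    print(d, round(explained_variance(d, theta=0.2), 3))
```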

We will now consider a second example where the GP to be approximated has an additive component, and compare the results of additive and non-additive Kriging emulators. Let $Y_A$ and $Y_S$ be independent centered GPs indexed by $[0,1]^d$ with respectively an additive kernel $K_A$ and a separable kernel $K_S$, where

$$ K_A(x,y) = \frac{1}{d} \sum_{i=1}^{d} \exp\left( -\frac{(x_i - y_i)^2}{2\theta^2} \right). \qquad (13) $$
Figure 4. Proportion of variance explained by Z | Z(X) versus dimension. The P criterion is computed for $n_t = 10000$ test points uniformly distributed on $[0,1]^d$. The 3 curves correspond to different values of the range parameter θ.



We define Y as $Y = Y_A + Y_S$, so that the first half of the variance of Y is explained by its additive part $Y_A$ and the second half by its separable part $Y_S$. We now compare the predictivity of 2 emulators:

$$ \begin{aligned} m_A(x) &= E\left( Y_A(x) \mid Y_A(X) + Y_S(X) \right) = k_A(x)^T (K_A + K_S)^{-1} \left( Y_A(X) + Y_S(X) \right), \\ m_S(x) &= E\left( Y_S(x) \mid Y_A(X) + Y_S(X) \right) = k_S(x)^T (K_A + K_S)^{-1} \left( Y_A(X) + Y_S(X) \right). \end{aligned} \qquad (14) $$

As we have seen previously, $m_A$ corresponds to the best predictor of an additive Kriging model with an observation noise given by $K_S$. This emulator cannot explain the non-additive part of Y. Reciprocally, $m_S$ is based on the separable kernel $K_S$ with an observation noise $K_A$. This term may be able to cover both the additive and non-additive parts of Y for a large number of observations. The prediction variances associated with those emulators are known analytically, so their predictivity can be compared as in the previous example. We observe on Figure 5 that the explained variance falls quickly to 0 when using a separable kernel, whereas an emulator based on an additive kernel can efficiently capture the additive trend of the phenomenon. On this example, and for a budget of 10 × d evaluations, additive Kriging models clearly outperform Kriging based on standard kernels.
Figure 5. Comparison of the predictivity of the approximation of Y by $m_A$ and $m_S$.



4. Application to the g-function of Sobol 

In order to illustrate the methodology and to compare it to existing algorithms, an analytical test case is considered. The function to approximate is the g-function of Sobol, defined over $[0,1]^d$ by

$$ g(x) = \prod_{k=1}^{d} \frac{|4x_k - 2| + a_k}{1 + a_k} \quad \text{with } a_k \geq 0. \qquad (15) $$

This popular function in the literature [15] is obviously not additive. However, depending on the coefficients $a_k$, g can be very close to an additive function: as a rule, the larger the $a_k$, the closer g is to an additive function. One main advantage for our study is that the Sobol sensitivity indices can be obtained analytically, so we can quantify the degree of additivity of the test function. For i = 1, ..., d, the first-order index $S_i$ associated with the variable $x_i$ is

$$ S_i = \frac{\dfrac{1}{3(1+a_i)^2}}{\prod_{k=1}^{d} \left( 1 + \dfrac{1}{3(1+a_k)^2} \right) - 1}. \qquad (16) $$
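A short sketch of equations (15) and (16). It relies on the known facts that each factor of g has mean 1 and partial variance $V_k = 1/(3(1+a_k)^2)$, so that $\mathrm{var}(g) = \prod_k (1 + V_k) - 1$; a Monte Carlo estimate is used as a sanity check. The coefficients `a` below are arbitrary illustration values.

```python
import numpy as np

def sobol_g(x, a):
    # g(x) = prod_k (|4 x_k - 2| + a_k) / (1 + a_k), for x in [0, 1]^d
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=-1)

def first_order_indices(a):
    # Analytical first-order Sobol indices of eq. (16)
    v = 1.0 / (3.0 * (1.0 + np.asarray(a, dtype=float)) ** 2)   # partial variances V_k
    return v / (np.prod(1.0 + v) - 1.0)

a = np.array([0.0, 1.0, 9.0])
S = first_order_indices(a)
print(S, S.sum())   # S.sum() < 1: the remainder is carried by interactions

# Monte Carlo check of the total variance prod(1 + V_k) - 1 (E[g] = 1).
rng = np.random.default_rng(0)
x = rng.uniform(size=(200_000, a.size))
v = 1.0 / (3.0 * (1.0 + a) ** 2)
print(sobol_g(x, a).var(), np.prod(1.0 + v) - 1.0)
```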



Here, we impose the same value of the parameters $a_k$ in all directions (i.e. $\forall k,\ a_k = a_1$). As the additivity of the g-function is tunable, we choose $a_1$ such that the variance of the additive part of g corresponds to 75% of the variance of g:

$$ \sum_{i=1}^{d} S_i = 0.75 \iff \frac{d\,u}{(1+u)^d - 1} = 0.75 \quad \text{with } u = \frac{1}{3(1+a_1)^2}. \qquad (17) $$

Eventually, the value of $a_1$ can be obtained by finding the zeros of a polynomial in u. Note that different values of d lead to different values of $a_1$.
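Instead of extracting polynomial roots, the sketch below solves eq. (17) by bisection, a technique we substitute here: for d ≥ 2, the share $d\,u / ((1+u)^d - 1)$ decreases from 1 (as u → 0) as u grows, and u = 1/3 corresponds to $a_1 = 0$, so the root with $a_1 \geq 0$ lies in (0, 1/3]. The function names are ours.

```python
import numpy as np

def additive_share(u, d):
    # Sum of first-order indices when all a_k are equal: d u / ((1 + u)^d - 1)
    return d * u / ((1.0 + u) ** d - 1.0)

def solve_a1(d, target=0.75, tol=1e-12):
    """Find a_1 >= 0 such that the first-order indices sum to `target` (eq. (17)).

    Bisection on u in (0, 1/3], where u = 1 / (3 (1 + a_1)^2).
    """
    lo, hi = 1e-12, 1.0 / 3.0
    if additive_share(hi, d) > target:
        raise ValueError("no solution with a_1 >= 0 for this d")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if additive_share(mid, d) > target:
            lo = mid   # share still too large: the root is at a larger u
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return np.sqrt(1.0 / (3.0 * u)) - 1.0

for d in (5, 10, 20, 30):
    print(d, round(solve_a1(d), 3))   # a_1 increases with d
```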

For d ∈ {5, 10, 20, 30} and a Latin hypercube design based on 10 × d points, we compare a usual Kriging model (UKM) with AKM and GAM. The two Kriging models are ordinary Kriging models since they include a constant term as a trend. As GAM is based on smoothing cubic splines, we choose a Matérn 5/2 kernel with observation noise for the Kriging models so that the different models have similar regularity. The results for UKM and GAM are obtained with the DiceKriging [14] and GAM [8] R packages available on CRAN [18]. For AKM and UKM, the three parameters of the kernels (σ², θ, τ²) are obtained by maximum likelihood estimation [13, 16]. To assess the quality of the obtained metamodels, the predictivity coefficient Q2 is computed on a test sample of $n_t = 1000$ points uniformly distributed over $[0,1]^d$:

$$ Q_2(y, \hat y) = 1 - \frac{\sum_{i=1}^{n_t} (y_i - \hat y_i)^2}{\sum_{i=1}^{n_t} (y_i - \bar y)^2}, \qquad (18) $$

where y is the vector of the values at the test points, $\hat y$ is the vector of predicted values and $\bar y$ is the mean of y.
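For reference, eq. (18) translates directly into a few lines of NumPy. The vectors below are toy values: a perfect predictor gives Q2 = 1, predicting the mean of y gives Q2 = 0, and a predictor worse than the mean gives a negative value.

```python
import numpy as np

def q2(y, y_hat):
    # Predictivity coefficient of eq. (18): 1 - residual sum of squares / total sum of squares
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(q2(y, y))                          # perfect predictor
print(q2(y, np.full_like(y, y.mean())))  # constant predictor equal to the mean of y
print(q2(y, [1.5, 1.5, 3.5, 3.5]))       # residual SS 1, total SS 5
```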

As the parameter estimation accuracy and the overall quality of an emulator are likely to fluctuate with the DoE, we repeated the building and testing of each emulator 50 times for different DoEs. The results are presented in Figure 6. Contrary to what we observed in Section 3, the predictivity of the Kriging model based on a separable kernel does not fall to zero when the dimension increases. As we impose the additive part of g to explain 75% of its variance, the value of the coefficient $a_1$ increases with d and the g-function becomes smoother. As a result, the estimated range parameter θ increases with d (we have θ ≈ 0.5 for d = 5 and θ ≈ 2 for d = 30), so the predictivity of the models based on separable kernels does not fall to zero as it did previously.

In order to illustrate the increasing smoothness of g, we represent the univariate submodels $\tilde m_1$ for various values of d (Figure 7). Even if the observation points do not show any obvious trend, the submodels are close to the analytical main effects.



5. Concluding remarks 

The proposed methodology seems to be a good challenger for additive modeling. On the first example, additive models appear to be well suited for high-dimensional modeling with a DoE budget of 10 × d, whereas Kriging models based on standard kernels fail to recover the function to approximate. One important result is that additive Kriging models succeed in extracting the additive trend of the function to approximate even if this function is not purely additive.

Figure 6. Boxplots of the predictivity coefficients Q2 for three emulators: usual Kriging model (UKM), additive Kriging model (AKM) and GAM. For a given boxplot, the variability is due to the choice of the DoE, which is repeated 50 times.

The proposed additive models take advantage of additivity while also benefiting from GP features. For the first point, we can cite the complexity reduction and the interpretability of additive models. For the second, the main asset is that GP models include a prediction variance for the model but also for each submodel. This justifies modeling an additive function on $\mathbb{R}^d$ instead of building d metamodels over $\mathbb{R}$, since the prediction variance is not additive. In the end, the proposed methodology is fully compatible with Kriging-based methods and their versatile applications. For example, one can choose a kernel well suited to the function to approximate, or use additive Kriging for high-dimensional optimization strategies relying on the expected improvement criterion.

Figure 7. Representation of the univariate submodels $\tilde m_1(x_1)$ (solid lines) for three additive Kriging models: (a) d = 10, (b) d = 30, (c) d = 50. As a comparison, the analytical main effects are given by the dashed lines. The bullets denote the centered observation points.

In this article, we only considered isotropic kernels. As for separable kernels, the use of additive kernels can easily be extended to anisotropic kernels (i.e. one range parameter $\theta_i$ per direction), but additive kernels also allow one to define one variance parameter $\sigma_i^2$ per direction. This feature, which is not possible for separable kernels, can enable additive models to approximate functions for which the variance depends on the direction. However, the total number of parameters would then be 2d + 1, and the practicability of their estimation deserves to be studied in detail.
Appendix A: Proof of Proposition 1 for d = 2

Let Z be a centered random process indexed by $\mathbb{R}^2$ with covariance kernel $K(x,y) = K_1(x_1,y_1) + K_2(x_2,y_2)$, and $Z_T$ the random process defined by $Z_T(x_1,x_2) = Z(x_1,0) + Z(0,x_2) - Z(0,0)$. By construction, the paths of $Z_T$ are additive functions. In order to show the additivity of the paths of Z, we will show that $\forall x \in \mathbb{R}^2$, $P(Z(x) = Z_T(x)) = 1$. For the sake of simplicity, the three terms of $\mathrm{var}[Z(x) - Z_T(x)] = \mathrm{var}[Z(x)] + \mathrm{var}[Z_T(x)] - 2\,\mathrm{cov}[Z(x), Z_T(x)]$ are studied separately:

$$ \begin{aligned} \mathrm{var}[Z(x)] &= K(x,x) \\ \mathrm{var}[Z_T(x)] &= \mathrm{var}[Z(x_1,0) + Z(0,x_2) - Z(0,0)] \\ &= \mathrm{var}[Z(x_1,0)] + \mathrm{var}[Z(0,x_2)] + 2\,\mathrm{cov}[Z(x_1,0), Z(0,x_2)] \\ &\quad + \mathrm{var}[Z(0,0)] - 2\,\mathrm{cov}[Z(x_1,0), Z(0,0)] - 2\,\mathrm{cov}[Z(0,x_2), Z(0,0)] \\ &= K_1(x_1,x_1) + K_2(0,0) + K_1(0,0) + K_2(x_2,x_2) + K(0,0) \\ &\quad + 2\left(K_1(x_1,0) + K_2(0,x_2)\right) - 2\left(K_1(x_1,0) + K_2(0,0)\right) - 2\left(K_1(0,0) + K_2(x_2,0)\right) \\ &= K_1(x_1,x_1) + K_2(x_2,x_2) = K(x,x) \\ \mathrm{cov}[Z(x), Z_T(x)] &= \mathrm{cov}[Z(x_1,x_2), Z(x_1,0) + Z(0,x_2) - Z(0,0)] \\ &= K_1(x_1,x_1) + K_2(x_2,0) + K_1(x_1,0) + K_2(x_2,x_2) - K_1(x_1,0) - K_2(x_2,0) \\ &= K_1(x_1,x_1) + K_2(x_2,x_2) = K(x,x) \end{aligned} $$
These three equations imply that $\mathrm{var}[Z(x) - Z_T(x)] = 0$ for all $x \in \mathbb{R}^2$. As $E[Z(x) - Z_T(x)] = 0$, we have $P(Z(x) = Z_T(x)) = 1$, so $Z_T$ is a modification of Z with additive paths.
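The proof reduces to the identity $\mathrm{var}[c^T W] = c^T C c = 0$, where $W = (Z(x), Z(x_1,0), Z(0,x_2), Z(0,0))$, C is its covariance matrix and c = (1, -1, -1, 1). The sketch below checks it numerically for an additive kernel built from unit-variance squared-exponential kernels (an assumed choice) at an arbitrary point, and shows that the same contrast has positive variance under a separable product kernel.

```python
import numpy as np

def k_se(s, t, theta=0.6):
    # Univariate squared-exponential kernel with unit variance (assumed).
    return np.exp(-((s - t) ** 2) / (2.0 * theta ** 2))

def K_add(x, y):
    # Additive kernel K(x, y) = K_1(x_1, y_1) + K_2(x_2, y_2)
    return k_se(x[0], y[0]) + k_se(x[1], y[1])

def K_prod(x, y):
    # Separable (product) kernel, for comparison: not additive
    return k_se(x[0], y[0]) * k_se(x[1], y[1])

x = np.array([0.37, -1.2])   # arbitrary point of R^2
pts = [x, np.array([x[0], 0.0]), np.array([0.0, x[1]]), np.array([0.0, 0.0])]
c = np.array([1.0, -1.0, -1.0, 1.0])   # Z(x) - Z_T(x) = c^T (Z(x), Z(x1,0), Z(0,x2), Z(0,0))

C = np.array([[K_add(p, q) for q in pts] for p in pts])
C_prod = np.array([[K_prod(p, q) for q in pts] for p in pts])
print(c @ C @ c)        # var[Z(x) - Z_T(x)]: zero up to rounding
print(c @ C_prod @ c)   # clearly positive for the product kernel
```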



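Appendix B: Calculation of the prediction variance

Let us consider a DoE composed of the 3 points $\{x^{(1)}, x^{(2)}, x^{(3)}\}$ represented on the left panel of Figure 1. We want to show here that, although $x^{(4)}$ does not belong to the DoE, we have $v(x^{(4)}) = 0$. The four points are the corners of a rectangle, with $x_1^{(4)} = x_1^{(2)}$, $x_2^{(4)} = x_2^{(3)}$, $x_1^{(3)} = x_1^{(1)}$ and $x_2^{(2)} = x_2^{(1)}$. By additivity of the kernel, this implies $k(x^{(4)}) = k(x^{(2)}) + k(x^{(3)}) - k(x^{(1)})$. Moreover, since $k(x^{(j)})$ is the j-th column of K for j = 1, 2, 3, $K^{-1} k(x^{(j)})$ is the j-th canonical basis vector. Hence

$$ \begin{aligned} v(x^{(4)}) &= K(x^{(4)},x^{(4)}) - k(x^{(4)})^T K^{-1} k(x^{(4)}) \\ &= K(x^{(4)},x^{(4)}) - \left(k(x^{(2)}) + k(x^{(3)}) - k(x^{(1)})\right)^T K^{-1} k(x^{(4)}) \\ &= K_1(x_1^{(4)},x_1^{(4)}) + K_2(x_2^{(4)},x_2^{(4)}) - \begin{pmatrix} -1 & 1 & 1 \end{pmatrix} \begin{pmatrix} K_1(x_1^{(1)},x_1^{(4)}) + K_2(x_2^{(1)},x_2^{(4)}) \\ K_1(x_1^{(2)},x_1^{(4)}) + K_2(x_2^{(2)},x_2^{(4)}) \\ K_1(x_1^{(3)},x_1^{(4)}) + K_2(x_2^{(3)},x_2^{(4)}) \end{pmatrix} \\ &= K_1(x_1^{(2)},x_1^{(2)}) + K_2(x_2^{(3)},x_2^{(3)}) - K_1(x_1^{(2)},x_1^{(2)}) - K_2(x_2^{(3)},x_2^{(3)}) = 0. \end{aligned} $$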




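Appendix C: Calculation of $\tilde v_i$

We want here to calculate the variance of $Z_i(x_i) - \int Z_i(s_i)\,ds_i$ conditionally on the observations F. Since the conditional covariance of $Z_i(x_i)$ and $Z_i(s_i)$ given Z(X) = F is $K_i(x_i,s_i) - k_i(x_i)^T K^{-1} k_i(s_i)$, we get

$$ \begin{aligned} \tilde v_i(x_i) &= \mathrm{var}\left[ Z_i(x_i) - \int Z_i(s_i)\,ds_i \,\middle|\, Z(X) = F \right] \\ &= \mathrm{var}\left[ Z_i(x_i) \mid Z(X) = F \right] - 2\,\mathrm{cov}\left[ Z_i(x_i), \int Z_i(s_i)\,ds_i \,\middle|\, Z(X) = F \right] + \mathrm{var}\left[ \int Z_i(s_i)\,ds_i \,\middle|\, Z(X) = F \right] \\ &= v_i(x_i) - 2\int K_i(x_i,s_i)\,ds_i + 2\int k_i(x_i)^T K^{-1} k_i(s_i)\,ds_i \\ &\quad + \iint K_i(s_i,t_i)\,ds_i\,dt_i - \iint k_i(s_i)^T K^{-1} k_i(t_i)\,ds_i\,dt_i. \end{aligned} $$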